Abstract


This paper surveys a variety of recent results addressing the problem of range
queries in computational geometry. The major contribution of this paper is in
identifying three general methods for range queries in computational geometry and in
classifying many of the recent results into one or more of these approaches. The
three methods discussed in this paper are random sampling, search-tree tables, and
space-partition trees. This survey assumes some familiarity with basic computational
geometry concepts and techniques.


1  Introduction 

A  widely  studied  area  of  algorithmic  design  concerns  the  construction  of  efficient  data 
structures  for  maintaining  a  database  of  records  that  supports  various  types  of  queries.  A 
variety  of  sophisticated  data  structures  and  algorithmic  techniques  for  database  queries 
have  been  recently  developed. 

A rather general class of database queries is the class of range queries. A range query
is a request to identify the records in a database with keys falling in a certain range.
When each record has one key, there are various schemes for efficiently processing such
range queries (balanced binary trees are a typical example). The problem becomes more
complicated, however, when records have d > 1 keys and range queries involve general
ranges in the d-dimensional space of keys. Multidimensional range queries, and in particular
orthogonal range queries, where ranges are generalized rectangles in d dimensions,
have received a considerable amount of attention (see [2, 3, 4, 5, 19, 22, 23, 24, 28, 29]).

This research was supported in part by the Defense Advanced Research Projects Agency under Contract
N00014-87-K-0825. 
Recently, the problem of multidimensional range queries has also been intensively studied
in  the  domain  of  computational  geometry.  The  problem  of  range  queries  in  computational 
geometry  can  be  informally  described  as  follows:  given  a  collection  of  n  geometrical  objects 
in  the  d-dimensional  Euclidean  space,  store  them  in  a  data  structure,  such  that  when  given 
a  range  (subset)  of  the  d-dimensional  Euclidean  space,  the  objects  falling  in  that  range 
can  be  quickly  identified.  The  n  geometrical  objects  are  considered  to  be  the  records  of 
a  database,  and  the  subset  of  the  space  is  the  query  range.  A  formal  definition  of  range 
queries  is  given  in  Section  3. 

Throughout this paper we consider a prototypical example, the half-space range query
problem. There are two important variants of this problem, half-space counting and half-space
reporting. The half-space counting problem can be stated as follows: given n points
in the d-dimensional Euclidean space, organize them in a data structure, such that
the number of points in a query half-space can be quickly determined. The half-space
reporting problem is similar, except that the points in the query half-space need to be
actually reported. Other common variants involve checking whether there exists a point
that falls in the query half-space, checking whether all the points fall in the query half-space,
and finding a point that minimizes a given function with respect to the query
half-space. In Section 3, we define these types of queries more precisely.
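As a point of reference for the two variants, the following minimal sketch answers half-space counting and reporting queries by exhaustive search. It assumes that a closed half-space is given by a coefficient vector `a` and an offset `b` (the set of points x with a . x + b >= 0); the function names are ours and are purely illustrative.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def in_half_space(p: Sequence[float], a: Sequence[float], b: float) -> bool:
    """Return True if p satisfies a . p + b >= 0 (closed half-space)."""
    return sum(ai * pi for ai, pi in zip(a, p)) + b >= 0


def half_space_count(points: Sequence[Point], a: Sequence[float], b: float) -> int:
    """Half-space counting by exhaustive search: one membership test per point."""
    return sum(1 for p in points if in_half_space(p, a, b))


def half_space_report(points: Sequence[Point], a: Sequence[float], b: float) -> List[Point]:
    """Half-space reporting by exhaustive search, in input order."""
    return [p for p in points if in_half_space(p, a, b)]


# Four points in the plane and the query half-plane x + y - 2 >= 0.
points = [(0.0, 0.0), (1.0, 2.0), (3.0, 1.0), (-1.0, -1.0)]
a, b = (1.0, 1.0), -2.0
print(half_space_count(points, a, b))   # how many points fall in the half-plane
print(half_space_report(points, a, b))  # which points fall in the half-plane
```

Both functions touch every point of the database, so they serve only as the linear-time baseline against which the data structures surveyed in this paper are measured.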

Ranges  in  computational  geometry  are  usually  defined  by  analytic  curves,  and  can 
thus  be  quite  general.  However,  efficient  algorithms  and  data  structures  for  processing 
the  corresponding  range  queries  are  known  only  for  some  specific  types  of  ranges.  These 
types  of  ranges  include  half-spaces,  hyperrectangular  (orthogonal)  ranges,  hyperspherical 
ranges,  simplicial  ranges,  and  polyhedral  ranges  with  a  bounded  number  of  faces.  Although 
it  seems  at  first  that  these  different  types  of  ranges  have  little  in  common,  it  is  possible, 
for  example,  to  transform  some  types  of  range  queries  to  others.  These  transformations 
usually  involve  mapping  objects  in  the  d-dimensional  Euclidean  space  to  points  in  some 
higher  dimensional  Euclidean  space.  Some  examples  of  these  transformations  are  given  in 
[17, 35, 36]. Dobkin & Edelsbrunner [17], for instance, showed that a variety of queries in
$E^2$ involving points intersecting triangles, segments intersecting segments, and polygons
intersecting polygons, as well as related problems in $E^3$, can be reduced to half-space
range queries. Yao [35] showed that polyhedral range queries can be solved by a sequence
of half-space range queries, and that circular region queries in $E^2$ can be reduced to half-space
range queries in $E^3$. Yao & Yao [36] gave a general scheme for reducing a variety of
geometrical queries in $E^d$ to range queries in higher dimensional Euclidean spaces.
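To make the idea of such a transformation concrete, the sketch below reduces a circular query in the plane to a half-space query in three dimensions using the standard lifting of a point (x, y) to (x, y, x^2 + y^2). This is one well-known construction and is not necessarily the one used in [35]; all function names are illustrative assumptions.

```python
def lift(p):
    """Map a planar point (x, y) to the point (x, y, x^2 + y^2) in 3-space."""
    x, y = p
    return (x, y, x * x + y * y)


def disk_to_half_space(center, radius):
    """Return (a, b) such that a lifted point q satisfies a . q + b <= 0
    exactly when the original planar point lies in the closed disk."""
    cx, cy = center
    a = (-2.0 * cx, -2.0 * cy, 1.0)
    b = cx * cx + cy * cy - radius * radius
    return a, b


def in_disk(p, center, radius):
    """Direct test: squared distance to the center is at most radius^2."""
    dx, dy = p[0] - center[0], p[1] - center[1]
    return dx * dx + dy * dy <= radius * radius


def in_lifted_half_space(p, center, radius):
    """Same test, expressed as a half-space query on the lifted point."""
    a, b = disk_to_half_space(center, radius)
    q = lift(p)
    return sum(ai * qi for ai, qi in zip(a, q)) + b <= 0.0


points = [(0.0, 0.0), (1.0, 1.0), (3.0, 0.0), (2.0, 2.0)]
center, radius = (1.0, 0.0), 2.0
for p in points:
    print(p, in_disk(p, center, radius), in_lifted_half_space(p, center, radius))
```

The equivalence follows by expanding (x - cx)^2 + (y - cy)^2 <= r^2, which is linear in the lifted coordinates. With floating-point inputs the two tests may disagree for points extremely close to the circle.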

Efficient algorithms for range queries are of primary interest in the field of computational
geometry. Besides being important in their own right, range queries appear in many other
problems in computational geometry, which either use them as subroutines or use generalized
versions of them. Examples include constructing order-k Voronoi diagrams in the plane
(Lee [27]), finding a Minimal Spanning Tree in $E^d$ (Yao [33]), various geometric intersection
problems in $E^2$ and $E^3$ (Dobkin & Edelsbrunner [17]), polytope-polytope intersection
in $E^d$ and the nearest neighbor problem (Yao & Yao [36]). Some types of range queries can
themselves be reduced to other problems in computational geometry. For example, the
problem of half-space queries can be solved by reducing it to the point location problem
and building a data structure for point location searches. This reduction forms the basis
of some algorithms for the half-space query problem discussed in Section 5.

The term “efficient algorithms for range queries” needs a little clarification. The natural
complexity parameter of an algorithm for range queries is n, the number of objects in the
database. We generally assume that the database of objects will be queried many times,
so that it is worthwhile to invest the time for preprocessing the objects and to
allocate the storage for the data structure. A solution to a range query problem is usually
evaluated by its query time, its storage, and its preprocessing time. The main interest is
in sublinear query time, since for most range queries we can determine in constant time
whether an object falls in a specified range, which leads to a straightforward linear query
time algorithm (exhaustive search). When discussing the query time, it is important to
distinguish between range counting queries and range reporting queries. In the former
case, for example, there are algorithms with sublinear query time, which is independent
of the answer size. In the latter case, however, the query time contains two components,
the search time and the number of points reported. The query time for range reporting
problems is thus of the form O(f(n) + k), where f(n), the search time, may be sublinear
and independent of the specific query, and k, the report time, is dependent on the specific
query. The storage requirement is usually polynomial in n. There is a particular class
of algorithms for range queries, which we discuss in Section 6, that achieves the optimal,
linear storage. The preprocessing time is usually also polynomial in n, but is considered
a less important measure of an algorithm, as it represents only a one-time effort. Some
relationships between the query time and the storage are presented in Section 7.

In  this  paper,  we  study  the  problem  of  range  queries  in  computational  geometry.  In 
particular, we focus on half-space and simplicial range queries, which constitute a rather
large class of range queries. We describe, compare, and analyze three general methods for
processing certain classes of range queries in computational geometry. The three methods
are random sampling, search-tree tables, and space-partition trees. In the course of discussing
these three techniques, we survey and classify a variety of results for range query
problems in computational geometry. The different methods may sometimes be combined
and can be related in some cases. All the results presented use the Real Random Access
Machine  model  of  computation,  which  is  the  most  commonly  used  model  in  computational 
geometry.  This  model  is  an  extension  of  the  usual  RAM  model  with  unlimited  precision 
real  arithmetic  operations  and  storage  in  constant  time. 

The remainder of this paper is organized as follows. Section 2 introduces commonly
used geometric notations and some relevant problems and techniques in computational
geometry. In Section 3, we define the notion of a range query in abstract range spaces and
discuss some properties of range spaces and range queries. Section 4 presents the method
of random sampling for range queries. In Section 5, we describe the method of search-tree
tables for range queries. Section 6 discusses the method of space-partition trees for
range  queries.  In  Section  7,  we  present  some  lower  bounds  on  the  query  time  and  storage 
requirements  of  algorithms  for  range  queries.  Finally,  Section  8  summarizes  and  compares 
the  different  methods. 
2  Geometric  Fundamentals


In this section we introduce commonly used geometric notations, define some terms specific
to our discussion, and describe some relevant techniques in computational geometry. The
reader  may  wish  to  only  skim  this  section  at  first,  in  order  to  become  familiar  with  the 
notations  used  in  this  paper,  and  later  refer  to  specific  terms  as  necessary. 

Spaces, Points, and Point Sets. We use $E^d$ to denote the real d-dimensional Euclidean
space with the standard metric $L_2$. Points of $E^d$ are d-dimensional vectors over the reals.
For a point set $A \subseteq E^d$, we use aff A to denote the affine closure of A, and we use dim A
to denote the affine dimension of A. A k-flat is a point set $A \subseteq E^d$ with A = aff A and
k = dim A.

Hyperplanes and Half-Spaces. A hyperplane h in $E^d$ is a (d - 1)-flat, defined by the
equation $\sum_{i=1}^{d} a_i x_i + b = 0$ for some values of $a_1, a_2, \ldots, a_d$ and b. The hyperplane h separates $E^d$
into a right open half-space $h^+$, defined by $\sum_{i=1}^{d} a_i x_i + b > 0$, and a left open half-space $h^-$,
defined by $\sum_{i=1}^{d} a_i x_i + b < 0$. We use $h'$ to refer to either one of these two open half-spaces,
and $h^*$ to refer to either one of the closed half-spaces. We use the notation $\mathcal{H}_d$ to denote
the collection of all hyperplanes in $E^d$, the notation $\mathcal{H}'_d$ to denote all the open half-spaces
in $E^d$, and the notation $\mathcal{H}^*_d$ to denote all the closed half-spaces in $E^d$.

General  Position  Assumption.  In  computational  geometry  it  is  usually  assumed  that 
the  points  are  in  general  position.  This  means  that  all  points  are  distinct,  no  three  points 
are collinear, no four points are coplanar, and in general, no k + 1 points lie in the same
(k - 1)-flat, for k > 1. This is not a restrictive assumption, as small random perturbations
assure the above conditions with probability 1. Similarly, hyperplanes are assumed to be
in general position, that is, the intersection of any $r \le d$ hyperplanes in $E^d$ is a k-flat of
dimension k = d - r.

Convex Set, Convex Hull, and k-Set. A domain $A \subseteq E^d$ is convex if for any two
points $x, y \in A$, the line segment between x and y is entirely contained in A. The convex
hull of a point set A is the boundary of the minimal convex domain in $E^d$ containing A.
A k-set of a point set A is any subset $B \subseteq A$ of cardinality |B| = k, such that there is a
hyperplane h separating B from A - B, that is, $B = A \cap h^*$.

Polyhedral Sets and Polytopes. A polyhedral set is the intersection of a finite number
of closed half-spaces, and a polytope is a bounded polyhedral set. A face of a polyhedral
set P is the intersection of P with one or more of the hyperplanes defining it. Vertices,
edges, and facets of a polyhedral set of dimension d are faces of affine dimensions 0, 1, and
d - 1, respectively.

Complex.  A  complex  is  a  collection  of  polyhedral  sets,  such  that  every  face  of  a  polyhedral 
set  in  the  complex  is  also  in  the  complex,  and  the  intersection  of  two  polyhedral  sets  in 
the  complex  is  a  face  shared  by  each  of  them. 

Simplex and Triangulation. A k-simplex in $E^d$ (for $k \le d$) is the convex hull of k + 1
affinely independent points. Any r-dimensional face of a k-simplex is an r-simplex. When
the dimension of the k-simplex is not important, we refer to it as just a simplex. A
triangulation $\Delta(C)$ of a complex C is a refinement of C into simplicial regions, such that
each polyhedral set of dimension k in $\Delta(C)$ is a k-simplex.

Arrangement of Hyperplanes. For a collection H of n hyperplanes in $E^d$, the arrangement
of H, $\mathcal{A}(H)$, is the complex generated by the cells defined by the hyperplanes of H.
For representing arrangements, we usually keep lists of polyhedral regions and neighborhood
relations. For $d \ge 0$ and $n \ge 0$, we define $\Phi_d(n)$ to be $\sum_{i=0}^{d} \binom{n}{i}$ if $d \le n$, and $2^n$
otherwise. For a collection H of n hyperplanes in general position in $E^d$, the arrangement
$\mathcal{A}(H)$ has $\Phi_d(n)$ cells. Furthermore, if h is any other hyperplane in $E^d$, then the number
of cells in the arrangement $\mathcal{A}(H)$ intersected by h is at most $\Phi_{d-1}(n)$.
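The cell-count function is easy to evaluate directly. The sketch below assumes the standard reading of the definition, namely the sum of the binomial coefficients C(n, i) for i = 0, ..., d when d <= n, and 2^n otherwise; the function name `phi` is ours.

```python
from math import comb


def phi(d: int, n: int) -> int:
    """Number of cells in an arrangement of n hyperplanes in general position
    in d-dimensional space, under the definition assumed in the lead-in."""
    if d < 0 or n < 0:
        raise ValueError("d and n must be non-negative")
    if d > n:
        return 2 ** n
    return sum(comb(n, i) for i in range(d + 1))


# Three lines in general position in the plane.
print(phi(2, 3))  # cells of the arrangement
print(phi(1, 3))  # upper bound on the cells crossed by one further line
```

For example, three lines in general position in the plane form 1 + 3 + 3 = 7 cells, and a fourth line crosses at most 1 + 3 = 4 of them.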

Point Hyperplane Duality. Points in $E^d$ can be transformed to hyperplanes in $\mathcal{H}_d$ and
vice versa. The most common of these transformations, the Hough transformation, maps a
point $(a_1, a_2, \ldots, a_d)$ in $E^d$ to the hyperplane $\sum_{i=1}^{d} a_i x_i + 1 = 0$ in $\mathcal{H}_d$, and conversely.

The  duality  relationship  between  points  and  hyperplanes  allows  a  problem  in  the  points 
space  to  be  solved  by  first  transforming  the  points  into  the  hyperplanes  domain,  and  then 
solving  a  related  problem  in  the  dual  space.  Similarly,  in  the  other  direction,  problems  on 
hyperplanes  can  be  transformed  into  related  problems  on  points. 
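The property that makes this duality useful is that incidence is preserved: a point p lies on the dual hyperplane of q exactly when q lies on the dual hyperplane of p. A minimal sketch follows, assuming a hyperplane is represented as a pair (coefficients, offset); the representation and function names are illustrative only.

```python
def dual_hyperplane(point):
    """Dual of the point (a_1, ..., a_d): the hyperplane sum_i a_i x_i + 1 = 0."""
    return (tuple(point), 1.0)


def dual_point(hyperplane):
    """Dual of a hyperplane not passing through the origin: rescale so that the
    offset becomes 1 and read off the coefficients."""
    coeffs, offset = hyperplane
    if offset == 0:
        raise ValueError("hyperplanes through the origin have no dual point under this map")
    return tuple(c / offset for c in coeffs)


def evaluate(hyperplane, x):
    """Value of sum_i a_i x_i + b at x: zero on the hyperplane, and its sign
    tells on which side of the hyperplane x lies."""
    coeffs, offset = hyperplane
    return sum(c * xi for c, xi in zip(coeffs, x)) + offset


p, q = (1.0, -1.0), (2.0, 3.0)
print(evaluate(dual_hyperplane(q), p))  # p against the dual of q
print(evaluate(dual_hyperplane(p), q))  # q against the dual of p
```

Because both evaluations compute the same inner product plus one, the two values always agree. The side on which a point lies relative to a dual hyperplane is therefore preserved as well, which is what lets half-space queries on points be restated as point location among hyperplanes.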

Point Location Problem. Given an underlying partition of the space into regions (e.g.,
Voronoi diagrams, grids, or arrangements of hyperplanes), the point location problem is to
identify  the  region  in  which  a  query  point  falls. 


3  Range  Spaces  and  Range  Queries 

In  this  section  we  formally  define  the  notions  of  range  space  and  range  query  and  present 
some  properties  of  range  spaces.  We  relate  those  notions  and  properties  to  range  queries 
in  computational  geometry. 

We  first  define  the  notion  of  a  range  space.  The  following  definition  is  from  Haussler 
& Welzl [25] and is based on the pioneering work by Vapnik & Chervonenkis [30].

Definition 1 A range space T is a pair $(V, \mathcal{R})$, where V is a set and $\mathcal{R}$ is a set of subsets
of V. Members of V are called elements or points of T and members of $\mathcal{R}$ are called ranges
of T. The range space T is finite if V is finite.

In the domain of computational geometry, we usually use the range space $T = (E^d, \mathcal{R})$,
where elements of T are points in the d-dimensional Euclidean space, and $\mathcal{R}$ is some family
of regions in $E^d$.

We  next  present  the  definition  of  the  Vapnik-Chervonenkis  dimension  of  an  abstract 
range  space.  Recently,  it  became  evident  that  the  notion  of  the  V-C  dimension  of  a  range 
space  is  an  important  and  useful  parameter  of  the  space. 

Definition 2 Let $T = (V, \mathcal{R})$ be a range space and let $X \subseteq V$ be a finite set of elements
of T. We denote by $\Pi_{\mathcal{R}}(X)$ the set of all subsets of X that can be obtained by intersecting
X with a range of T, that is, $\Pi_{\mathcal{R}}(X) = \{X \cap Y : Y \in \mathcal{R}\}$. If $\Pi_{\mathcal{R}}(X) = 2^X$, then we say
that X is shattered by $\mathcal{R}$. The Vapnik-Chervonenkis dimension of T (or simply the V-C
dimension) is the largest integer d such that there exists a subset X of V of cardinality d
that is shattered by $\mathcal{R}$. If no such maximal d exists, we say the dimension of T is infinite.

Applying Definition 2 to range spaces in computational geometry gives some interesting
results. For example, the V-C dimension of the range space $T_1 = (E^d, \mathcal{H}'_d)$, consisting
of all open half-spaces as ranges, is d + 1. As another example, the V-C dimension of
$T_2 = (E^2, C_3)$, consisting of all triangular regions in the Euclidean plane, is 7. The latter
example can be verified by noticing that any subset of 7 points, equally spaced on the
circumference of a circle in $E^2$, can be separated from the other points by drawing an
appropriate triangle that surrounds them. For 8 points in $E^2$ it is no longer possible to
separate every subset of them with a triangle.
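For a finite range space given explicitly, shattering and the V-C dimension can be checked by brute force. The sketch below is only an illustration of Definition 2 on a toy example (points on a line with intervals as ranges); it does not apply to the infinite geometric range spaces discussed above, and the function names are ours.

```python
from itertools import combinations


def projections(X, ranges):
    """The set of distinct subsets of X obtained by intersecting X with a range."""
    X = frozenset(X)
    return {X & frozenset(Y) for Y in ranges}


def is_shattered(X, ranges):
    """X is shattered if every one of its 2^|X| subsets is cut out by some range."""
    return len(projections(X, ranges)) == 2 ** len(frozenset(X))


def vc_dimension(V, ranges):
    """Largest size of a shattered subset of the finite set V (ranges non-empty).
    Stops at the first size with no shattered subset, since every subset of a
    shattered set is itself shattered."""
    V = list(V)
    best = 0
    for k in range(1, len(V) + 1):
        if any(is_shattered(S, ranges) for S in combinations(V, k)):
            best = k
        else:
            break
    return best


# Toy range space: four points on a line, ranges are all intervals [i, j] plus the empty set.
V = [1, 2, 3, 4]
intervals = [frozenset(range(i, j + 1)) for i in V for j in V if i <= j] + [frozenset()]
print(is_shattered({1, 2}, intervals))
print(is_shattered({1, 2, 3}, intervals))
print(vc_dimension(V, intervals))
```

No interval can contain the two outer points of a triple without containing the middle one, so no three points are shattered and the dimension of this toy space is 2.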

We  now  introduce  a  formal  definition  for  the  notion  of  a  range  query  in  range  spaces. 
This definition is a combination of ideas from Fredman [24] and Haussler & Welzl [25].

Definition 3 Let $T = (V, \mathcal{R})$ be a range space, (S, +) be a commutative semigroup, and
f be a function $f : V \times \mathcal{R} \to S$. A database is a finite set $X \subseteq V$. A range query Q on
a database $X \subseteq V$ and a range $Y \in \mathcal{R}$ is defined as $Q(X, Y) = \sum_{x \in X} f(x, Y)$, where the
sum is the semigroup sum.

The interpretation of Definition 3 in the domain of computational geometry is as follows.
A database X is a finite set of points from $E^d$. A range query is defined on a database X
and a range (subset) of $E^d$. The general outcome of a range query is considered as a value
from some commutative semigroup (S, +). The function f maps any pair (x, Y) of a point
and a range to some element in S. The choice of the function f and the semigroup (S, +)
depends on the intended purpose of the data structure. Several examples of selecting f
and (S, +) for commonly used range queries are listed below.

•  For counting the number of elements of X in the range Y, we define f(x, Y) to be 1
if x is in range Y and 0 otherwise. We choose (S, +) to be (Z, +), the integers with
the usual addition operation.

•  For reporting the elements of X in the range Y, we define f(x, Y) to be {x} if x
is in range Y and $\emptyset$ otherwise. We choose (S, +) to be $(2^X, \cup)$, the collection of all
subsets of X with the set union operation.

•  For checking whether there exists an element of X in the range Y, we define f(x, Y)
to be 1 if x is in range Y and 0 otherwise. We choose (S, +) to be $(\{0, 1\}, \lor)$.

•  For checking whether all the elements of X are in the range Y, we define f(x, Y) to
be 1 if x is in range Y and 0 otherwise. We choose (S, +) to be $(\{0, 1\}, \land)$.

•  For computing the minimal value of some real function g(x, Y) over all elements of
X and the range Y, we define f(x, Y) = g(x, Y). We choose (S, +) to be (R, min),
the reals with the minimum operation.
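The five choices listed above can all be expressed through a single generic routine that folds f(x, Y) over the database with the semigroup operation. The sketch below assumes a range is represented as a membership predicate and evaluates the query by exhaustive search. The helper names, and the particular function g used for the minimum, are hypothetical.

```python
from functools import reduce
import operator


def range_query(X, Y, f, op):
    """Q(X, Y): the semigroup sum (operation op) of f(x, Y) over a non-empty database X."""
    return reduce(op, (f(x, Y) for x in X))


def indicator(x, Y):
    """1 if x is in the range Y, 0 otherwise."""
    return 1 if Y(x) else 0


def singleton(x, Y):
    """{x} if x is in the range Y, the empty set otherwise."""
    return frozenset([x]) if Y(x) else frozenset()


def g(x, Y):
    """An arbitrary real function chosen for illustration; it ignores Y."""
    return x[0] + x[1] - 2


def half_plane(x):
    """The query range: the closed half-plane x + y - 2 >= 0."""
    return x[0] + x[1] - 2 >= 0


X = [(0, 0), (1, 2), (3, 1), (-1, -1)]
count = range_query(X, half_plane, indicator, operator.add)    # (Z, +)
report = range_query(X, half_plane, singleton, operator.or_)   # (2^X, union)
exists = range_query(X, half_plane, indicator, operator.or_)   # ({0, 1}, or)
all_in = range_query(X, half_plane, indicator, operator.and_)  # ({0, 1}, and)
minimum = range_query(X, half_plane, g, min)                   # (R, min)
print(count, sorted(report), exists, all_in, minimum)
```

Only the pair (f, op) changes between the five queries, which is the point of phrasing range queries over a commutative semigroup.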
The definition of range queries is rather general; all the types of range queries mentioned
in the introduction are special cases of Definition 3. For all these range queries we use
the range space $T = (E^d, \mathcal{R})$ mentioned above. The ranges $\mathcal{R}$ of T can be chosen as
half-spaces defined by hyperplanes in $E^d$, hyperrectangular (orthogonal) regions defined
by 2d real numbers, hyperspherical regions defined by their center point in $E^d$ and their
radius, simplicial regions defined by d + 1 points in $E^d$, and polyhedral regions defined by
a finite collection of hyperplanes in $E^d$.

In  the  rest  of  the  paper,  we  restrict  ourselves  to  half-space  and  simplex  queries,  and 
comment  on  how  some  of  the  results  can  be  extended  to  other  ranges.  As  was  mentioned 
in  the  introduction,  some  range  queries  can  be  reduced  to  half-space  queries,  and  thus 
half-space  queries  constitute  a  rather  general  class  of  range  queries. 


4  Random  Sampling 


In  this  section,  we  discuss  the  technique  of  randomly  sampling  the  database  of  points  to 
approximate  the  number  of  points  in  a  query  range.  The  usefulness  of  random  sampling 
methods  in  computational  geometry  has  been  recently  demonstrated  by  Clarkson  for  a 
variety  of  problems  (see  [11,  12,  13,  14]).  Random  sampling  for  range  queries  was  also 
used by Haussler & Welzl in [25].

The  main  idea  behind  the  use  of  random  sampling  is  that  a  sample  may  give  useful 
approximate  information  about  the  sampled  set.  For  example,  consider  the  half-space 
counting problem on a database X of n points in $E^d$. When a query half-space $h^*$
is introduced, we can pick a sample $Z \subseteq X$ consisting of m independently drawn points
from the n database points. The idea is that the fraction of the m sample points that fall
in the query range $h^*$ is a good approximation for the fraction of the points of X that lie
in the half-space $h^*$. Of course, the quality of the approximation improves as the size of
the  sample  set  Z  increases. 

To make the analysis simpler, we assume that $m \ll n$, that is, we pick a small fraction
of the points of X. Suppose that r out of the n points of X lie in the range $h^*$, and define
p = r/n. The probability that a randomly picked point from X falls in the range $h^*$
is thus p. We define the random variable W to be the number of points of the random
sample Z that fall in the range $h^*$. Since the m sample points are picked independently,
and because we assume that $m \ll n$, the random variable W has a binomial distribution
with probability p. The expected value of W is $\mu = mp = mr/n$ and the variance is
$\sigma^2 = mp(1 - p) = mr(n - r)/n^2$. Chebychev’s inequality then gives us, for any t > 0,

\[ \Pr\{ |W - \mu| \ge t \} \le \frac{\sigma^2}{t^2}. \]
We can now get a bound for the probability that the fraction W/m is an estimate of r/n
to within an accuracy of $\epsilon$:

\[ \Pr\left\{ \left| \frac{W}{m} - \frac{r}{n} \right| < \epsilon \right\} \ge 1 - \frac{1}{4 \epsilon^2 m}. \]
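The step from Chebychev’s inequality to this bound can be spelled out as follows. This is a sketch under the binomial model assumed above; the only ingredients are the choice $t = \epsilon m$ and the fact that $p(1 - p) \le 1/4$ for every p.

```latex
\begin{align*}
\Pr\left\{ \left| \frac{W}{m} - \frac{r}{n} \right| \ge \epsilon \right\}
  &= \Pr\{ |W - mp| \ge \epsilon m \}
     && \text{since } p = r/n \\
  &\le \frac{\sigma^2}{\epsilon^2 m^2}
     && \text{Chebychev with } t = \epsilon m \\
  &= \frac{m p (1 - p)}{\epsilon^2 m^2}
   \le \frac{1}{4 \epsilon^2 m}
     && \text{since } p (1 - p) \le 1/4 .
\end{align*}
```

Taking complements gives $\Pr\{ |W/m - r/n| < \epsilon \} \ge 1 - 1/(4 \epsilon^2 m)$.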


The interpretation of this inequality is that for a sample of size $m \ge 1/(4 \epsilon^2 \delta)$, the fraction
W/m approximates the fraction r/n to within an accuracy of $\epsilon$, with probability at least
$1 - \delta$. Notice that the analysis is not limited to half-space counting queries; it applies to any
range counting query in any range space, provided that we pick the random sample anew
with every query. This discussion and analysis give rise to a data structure for approximate
range counting that achieves O(1) query time (actually O(m) query time, but we assume
that m is a constant), and uses O(n) storage to store the n database points. There is no
preprocessing required, and thus the preprocessing time for such a data structure is also
O(1). The main drawback of this data structure is that we have to pick a new random
sample  for  each  query.  We  would  prefer  to  have  the  random  sample  picked  once,  and  use 
the  same  sample  to  approximate  successive  range  queries. 
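The resulting scheme can be sketched in a few lines. The sample size below is the smallest m for which the Chebychev-based bound 1/(4 ε² m) is at most δ; drawing with replacement makes the binomial model exact rather than approximate. The function names and the one-dimensional example are assumptions made for illustration.

```python
import math
import random


def sample_size(eps: float, delta: float) -> int:
    """Smallest m with 1 / (4 * eps^2 * m) <= delta."""
    return math.ceil(1.0 / (4.0 * eps * eps * delta))


def estimate_fraction(points, in_range, m, rng):
    """Fraction of m independent uniform draws (with replacement) that fall in the range."""
    hits = sum(1 for _ in range(m) if in_range(rng.choice(points)))
    return hits / m


def estimate_count(points, in_range, eps, delta, rng):
    """Approximate range count; off by at most eps * n with probability at least 1 - delta.
    A fresh sample is drawn on every call, as the analysis requires."""
    m = sample_size(eps, delta)
    return estimate_fraction(points, in_range, m, rng) * len(points)


rng = random.Random(0)        # fixed seed only to make the example repeatable
points = list(range(1000))    # a toy one-dimensional database
print(sample_size(0.05, 0.1))
print(estimate_count(points, lambda x: x < 250, 0.05, 0.1, rng))
```

The work per query depends only on ε and δ, not on n, which is why the query time is constant once the accuracy parameters are fixed.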

The  problem  of  using  one  random  sample  for  approximating  several  range  queries  has 
been studied in statistics. Vapnik & Chervonenkis [30] derived general conditions
under which several probabilities can be uniformly estimated using one random sample.
The following definition of an $\epsilon$-approximation was introduced by Vapnik & Chervonenkis.


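Definition 4 Let $T = (V, \mathcal{R})$ be a range space and let X be a database. For any $0 < \epsilon < 1$
and $Z \subseteq X$, we say that Z is an $\epsilon$-approximation of X (for $\mathcal{R}$), if for all $Y \in \mathcal{R}$ we have

\[ \left| \frac{|X \cap Y|}{|X|} - \frac{|Z \cap Y|}{|Z|} \right| \le \epsilon. \]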
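For a finite, explicitly listed family of ranges, the condition of Definition 4 can be checked directly by computing the largest gap between the two fractions. The sketch below uses exact rational arithmetic and a toy example of eight points on a line with half-lines as ranges; the function names are ours.

```python
from fractions import Fraction


def discrepancy(X, Z, ranges):
    """Maximum over the ranges Y of | |X n Y|/|X| - |Z n Y|/|Z| |, as an exact fraction."""
    X, Z = set(X), set(Z)
    return max(
        abs(Fraction(len(X & Y), len(X)) - Fraction(len(Z & Y), len(Z)))
        for Y in map(set, ranges)
    )


def is_eps_approximation(X, Z, ranges, eps):
    """True if Z is an eps-approximation of X for the given finite family of ranges."""
    return discrepancy(X, Z, ranges) <= eps


# Eight points on a line; the ranges are the half-lines {x <= t} for t = 0, ..., 8.
X = range(1, 9)
half_lines = [set(range(1, t + 1)) for t in range(0, 9)]
print(discrepancy(X, [2, 4, 6, 8], half_lines))  # evenly spread sample
print(discrepancy(X, [1, 2, 3, 4], half_lines))  # sample bunched at one end
```

The evenly spread sample tracks every half-line to within 1/8, while the sample bunched at one end is off by 1/2 on the half-line ending at 4.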


Vapnik & Chervonenkis also showed that $\epsilon$-approximations can be constructed using
random samples. They provided an upper bound on the size of a random sample needed to
construct an $\epsilon$-approximation of database X with probability at least $1 - \delta$, for any range
space with V-C dimension d. More precisely, they proved that if $T = (V, \mathcal{R})$ is a range
space of V-C dimension d, X is a database, and $\epsilon$ and $\delta$ are real numbers, $0 < \epsilon, \delta < 1$, then
a random sample Z of X of size m will be an $\epsilon$-approximation of X for $\mathcal{R}$ with probability
at least $1 - \delta$, provided

\[ m \ge \frac{c}{\epsilon^2} \left( d \lg \frac{d}{\epsilon} + \lg \frac{1}{\delta} \right) \]

for some positive real constant c.

From the result of Vapnik & Chervonenkis, it follows that for any database X, there is
an $\epsilon$-approximation Z of size at most $(c/\epsilon^2)(d \lg(d/\epsilon)) + 1$, which is independent of the size
of X. For example, if X is a set of points in $E^2$, then there exists a 0.01-approximation Z for
half-plane range queries, such that |Z| = 2,525,039. Similarly, there are $\epsilon$-approximations
of O(1) size for all the range queries discussed in the introduction.
These ideas can be used to construct a data structure for range queries that achieves
O(1) query time and uses only O(1) storage. Furthermore, the preprocessing, which involves
obtaining an $\epsilon$-approximation by randomly sampling the database, can also be done
in O(1) expected time. This data structure, however, can only support approximate solutions
for range queries. In addition, it can only be used for range counting queries and
not for other kinds of queries (like reporting or minimizing). Finally, although the sample
is of constant size, for most range spaces, the constants are quite large.

The technique of random sampling can also be employed to obtain data structures
that enable exact solutions to range queries. This was demonstrated by Clarkson [12],
who used random sampling to construct a search-tree based data structure for the point
location problem in $E^d$. Another use of random sampling was later demonstrated by
Haussler & Welzl [25], to construct a space-partition-tree based data structure for half-space
and simplicial range queries. Their scheme, which we describe in Section 6, uses the
notion of an $\epsilon$-net, which is related to the notion of an $\epsilon$-approximation of [30].

Definition 5 Let $T = (V, \mathcal{R})$ be a range space, X a database, and $0 < \epsilon < 1$. Then $\mathcal{R}_{X,\epsilon}$
denotes the set of all $Y \in \mathcal{R}$ that contain strictly more than $\epsilon |X|$ points from X, that is,
$|X \cap Y| / |X| > \epsilon$. A subset Z of X is an $\epsilon$-net of X (for $\mathcal{R}$) if Z contains a point in each
$Y \in \mathcal{R}_{X,\epsilon}$.

It is easily seen that every $\epsilon$-approximation of X is also an $\epsilon$-net of X. The converse,
however, is not true. In general, the size of an $\epsilon$-net can be much smaller than the size of
an $\epsilon$-approximation. Haussler & Welzl showed that $\epsilon$-nets can also be constructed using
random sampling. They provided an upper bound on the size of a random sample needed
to construct an $\epsilon$-net of database X with probability at least $1 - \delta$, for any range space
with V-C dimension d. The random sample Z should be formed by m independent draws
from X, where

\[ m \ge \max\left( \frac{4}{\epsilon} \lg \frac{2}{\delta}, \; \frac{8d}{\epsilon} \lg \frac{8d}{\epsilon} \right). \]


From this discussion it follows that for any database X, there is an $\epsilon$-net Z of size at
most $(8d/\epsilon) \lg(8d/\epsilon)$. Notice that an $\epsilon$-net of X is required to be a subset of X. Thus, for
example, for any database X in $E^d$ and for half-spaces as ranges, any 0-net must contain
all the extreme points of X. However, if we lift the requirement that the points of the
$\epsilon$-net are a subset of X, there always exist d + 1 points in $E^d$ that contain X in the convex
domain defined by them. These points, then, would constitute a 0-net of X of size d + 1.
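The following sketch illustrates both halves of this discussion on a toy example. The sample-size function assumes the bound in the form max((4/ε) lg(2/δ), (8d/ε) lg(8d/ε)), which is our reading of the Haussler & Welzl result and should be treated as an assumption. The ε-net check applies Definition 5 directly to a finite family of ranges. All names are illustrative.

```python
import math


def eps_net_sample_size(d: int, eps: float, delta: float) -> int:
    """Number of independent draws under the bound assumed in the lead-in."""
    return math.ceil(max((4.0 / eps) * math.log2(2.0 / delta),
                         (8.0 * d / eps) * math.log2(8.0 * d / eps)))


def is_eps_net(X, Z, ranges, eps):
    """True if Z (a subset of X) meets every range holding more than eps * |X| points of X."""
    X, Z = set(X), set(Z)
    if not Z <= X:
        raise ValueError("an eps-net of X must be a subset of X")
    heavy = [set(Y) for Y in ranges if len(X & set(Y)) > eps * len(X)]
    return all(Z & Y for Y in heavy)


# Eight points on a line; the ranges are all intervals [i, j].
X = range(1, 9)
intervals = [set(range(i, j + 1)) for i in X for j in X if i <= j]
print(is_eps_net(X, {3, 6}, intervals, 0.25))  # every interval with 3 or more points is hit
print(is_eps_net(X, {1, 8}, intervals, 0.25))  # the interval [2, 4] is missed
print(eps_net_sample_size(1, 0.5, 0.5))
```

Two well-placed points suffice here as a 1/4-net. The same two points are not a 1/4-approximation (the single-point range {3} holds 1/8 of X but 1/2 of Z), which illustrates that ε-nets can be much smaller.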


5  Search-Tree  Tables

In this section, we describe the method of search-tree tables for range queries and demonstrate
its use. A search-tree table is the underlying data structure of many algorithms
that achieve polylogarithmic search time. This search time usually comes at the expense
of polynomially large storage.

The main idea behind the search-tree tables method is that for a finite set of points
(a database) and for certain types of range queries, there are only polynomially many
possible solutions. Therefore, for certain kinds of range queries on a given database,
one can initially store all the possible solutions in a table, and when a query range is
introduced, the appropriate entry in the table can be located. The organization of these
tables is usually based on search trees, which enables achieving polylogarithmic search
time. The storage required by these data structures is polynomial in $n$, the size of the
database. The preprocessing time is usually also polynomial in $n$.
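
As a toy instance of this idea in one dimension, the sketch below stores the solution of every distinct interval query on a set of keys in a table and locates the right entry with two binary searches. The class name and layout are hypothetical; storing full solutions costs $O(n^3)$ space here ($O(n^2)$ entries of size up to $n$), which is the price of answering with a single lookup.

```python
from bisect import bisect_left, bisect_right

class IntervalTable:
    """Store the answer to every distinct 1-D interval query in a table."""

    def __init__(self, keys):
        self.keys = sorted(keys)
        n = len(self.keys)
        # table[i][j] is the stored solution for the rank pair (i, j):
        # the keys of rank i, i+1, ..., j-1 (empty when j <= i).
        self.table = [[tuple(self.keys[i:j]) for j in range(n + 1)]
                      for i in range(n + 1)]

    def report(self, lo, hi):
        """Return the keys in [lo, hi]: two binary searches, one table lookup."""
        i = bisect_left(self.keys, lo)    # rank of the first key >= lo
        j = bisect_right(self.keys, hi)   # rank just past the last key <= hi
        return self.table[i][j]
```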

In order to reduce the amount of storage, some algorithms, based on search-tree tables,
do not store entire solutions in table entries. Rather, table entries contain partial information
about the solution and pointers for further search in the table. These data structures,
achieving smaller storage, are more appropriate for range reporting queries than for range
counting queries. Recall that for range reporting queries, the search time is only one component
of the query time, with the other component being the report time. The data
structures described in this section will be analyzed in terms of their search time, ignoring
the report time component. Recently, Chazelle [6] proposed a technique called filtering
search, that reduces the total query time of certain range reporting queries, by attempting
to match the search time and report time components of the query time.

The simplest example of search-tree tables is probably 1-dimensional binary search
trees. The typical range query in 1 dimension is to determine the keys in some interval.
Balanced binary search trees achieve logarithmic search time and require linear storage.
Binary search trees do not contain entire solutions to range queries in their leaves, but
still enable retrieving intervals. Threaded binary search trees are a more typical example
of a data structure that stores partial solutions and pointers in leaves. There are several
generalizations of the 1-dimensional search trees to higher dimensions. Multidimensional
search trees were investigated mainly in the context of database queries (see [2, 3, 4, 19,
28, 29]). These data structures support orthogonal range queries in $d$ dimensions. The
storage is typically $O(n \lg^{d-1} n)$ with a search time of $O(\lg^d n)$. Notice, however, that these
multidimensional search trees are dynamic, that is, they support insertions and deletions
as well as orthogonal range queries.
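
To make the $O(n \lg^{d-1} n)$ storage and $O(\lg^d n)$ search time concrete for $d = 2$, here is a minimal sketch of a planar range tree for counting queries: a balanced tree over the $x$-order, with a sorted list of $y$-values at each node. It is a static illustration under that assumed layout, not the dynamic structures of the cited papers, and all names are hypothetical.

```python
from bisect import bisect_left, bisect_right

class RangeTree2D:
    """Static 2-D range tree: O(n lg n) storage, O(lg^2 n) counting queries."""

    def __init__(self, points):
        self.pts = sorted(points)                 # sorted by x (then y)
        self.xs = [p[0] for p in self.pts]
        self.n = len(self.pts)
        self.ys = {}                              # node id -> sorted y-values
        if self.n:
            self._build(1, 0, self.n)

    def _build(self, node, lo, hi):
        # The node covers the x-sorted slice pts[lo:hi] and keeps its y-values sorted.
        self.ys[node] = sorted(p[1] for p in self.pts[lo:hi])
        if hi - lo > 1:
            mid = (lo + hi) // 2
            self._build(2 * node, lo, mid)
            self._build(2 * node + 1, mid, hi)

    def count(self, x1, x2, y1, y2):
        """Count the points with x1 <= x <= x2 and y1 <= y <= y2."""
        i = bisect_left(self.xs, x1)
        j = bisect_right(self.xs, x2)
        if i >= j:
            return 0
        return self._count(1, 0, self.n, i, j, y1, y2)

    def _count(self, node, lo, hi, i, j, y1, y2):
        if j <= lo or hi <= i:                    # slice misses [i, j)
            return 0
        if i <= lo and hi <= j:                   # canonical node: one 1-D query
            ys = self.ys[node]
            return bisect_right(ys, y2) - bisect_left(ys, y1)
        mid = (lo + hi) // 2
        return (self._count(2 * node, lo, mid, i, j, y1, y2)
                + self._count(2 * node + 1, mid, hi, i, j, y1, y2))
```

Each of the $O(\lg n)$ levels stores $n$ values in total, and a query touches $O(\lg n)$ canonical nodes with one binary search each.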

The more general problem of half-space range queries has a natural representation as a
search problem. There is a duality relationship between half-space range queries and point
location problems. Using the Hough transformation (see Section 2), a set $X$ of $n$ points
in $E^d$ is mapped to a collection $H_X$ of $n$ hyperplanes in $E^d$. The problem of reporting
the points of $X$ that lie above some query hyperplane $h$ transforms to the problem of
reporting the hyperplanes of $H_X$ that lie above the dual point $p_h$ of $h$. If the arrangement
$A(H_X)$ of the hyperplanes $H_X$ is given, then the latter problem can be solved by locating
the cell in the arrangement $A(H_X)$ containing $p_h$, thereby reducing the half-space range
query problem to the point location problem.
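
The correspondence can be checked numerically in the plane. The sketch below uses one common point-line duality, chosen here as an assumption because it preserves the "above" relation as stated; it may differ in sign conventions from the transformation of Section 2. All function names are hypothetical.

```python
def dual_line_of_point(p):
    """Map the point (a, b) to the line y = -a*x + b, stored as (slope, intercept)."""
    a, b = p
    return (-a, b)

def dual_point_of_line(h):
    """Map the line y = m*x + c, stored as (m, c), to the point (m, c)."""
    m, c = h
    return (m, c)

def point_above_line(p, h):
    x, y = p
    m, c = h
    return y > m * x + c

def line_above_point(h, p):
    m, c = h
    x, y = p
    return m * x + c > y

def report_above_via_dual(points, h):
    """Report the points above h by testing their dual lines against the dual point of h."""
    ph = dual_point_of_line(h)
    return [p for p in points if line_above_point(dual_line_of_point(p), ph)]
```

With this map, the point $(a, b)$ lies above $y = mx + c$ exactly when $b > ma + c$, which is the same inequality as the dual line $y = -ax + b$ passing above the dual point $(m, c)$.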

The above reduction motivates a data structure for half-space range queries that represents
the $n$ database points by storing the arrangement of the $n$ dual hyperplanes. The
preprocessing involves computing the arrangement of the $n$ dual hyperplanes and storing
the solutions at the arrangement's cells. Each cell can contain a list (or a count) of the
hyperplanes above and below it. This scheme requires $O(n^{d+1})$ storage for a data structure
for half-space range queries in $E^d$, since there are $O(n^d)$ cells in a general arrangement of $n$
hyperplanes in $E^d$ and each cell stores $O(n)$ information. An alternative data structure can
be designed, where each cell only contains the “name” of a hyperplane bounding it from
above (defining a facet of the cell), and a pointer to the cell on the other side of that facet.
In this data structure it is required to first locate the cell containing the query point, and
then to traverse a list of cells in the arrangement and report the appropriate hyperplanes.
The search time accounts only for the time it takes to locate the first cell, while the time
it takes to traverse the list is considered part of the report time. This scheme reduces the
storage requirement to $O(n^d)$.

This discussion shows that efficient solutions for the point location problem in $E^d$ also
yield efficient solutions for half-space queries. In 2 dimensions, the point location problem
can be solved in $O(\lg n)$ time by searching a planar subdivision tree (Kirkpatrick [26]).
This forms the basis of the $O(n^3)$-storage, $O(\lg n)$-search-time data structure for half-plane
queries described by Edelsbrunner & Kirkpatrick [20]. This scheme can be modified
as described above to use only $O(n^2)$ storage. However, an optimal $O(n)$-storage, $O(\lg n)$-search-time
data structure for the half-plane reporting problem was obtained by Chazelle,
Guibas, & Lee [9]. Their method, however, only solves the half-plane reporting problem
and is not extendible to other ranges in the plane.

A variety of other data structures for range queries, based on search-tree tables, appear
in the literature. An $O(n (\lg n)^8 (\lg\lg n)^4)$-storage, $O(\lg n)$-search-time data structure
for the half-space reporting problem in $E^3$ was presented by Chazelle & Preparata in [10].
Their method again reduces the half-space range problem to a 3-dimensional point location
problem. The small storage in their scheme is achieved by bounding the number of $k$-sets
of an arbitrary point set in $E^3$. Improving the bound on the number of $k$-sets in $E^3$, Clarkson
[14] was able to improve the storage requirement for the half-space reporting problem
in $E^3$ to $O(n (\lg n)^2 \lg\lg n)$, while retaining the $O(\lg n)$ search time. In [20] Edelsbrunner,
Kirkpatrick, and Maurer demonstrated an $O(n^7)$-storage, $O(\lg n)$-search-time data
structure for triangular range queries in the plane.

A general data structure for the half-space reporting problem in $d$ dimensions, based
on search-tree tables, was given by Cole & Yap [16]. We illustrate their technique in $E^2$,
and comment about its generalization to $E^d$. Given a database $X$ of $n$ points in the plane,
we draw a vertical line $L$ to the right of the points in $X$. For any pair of points $p$, $q$ from
$X$, we compute the intersection point of the line through $p$ and $q$ with the base line $L$.
These intersection points divide $L$ into $\binom{n}{2} + 1$ intervals with the following property.
Consider rotating a test line $P$ from slope $-\infty$ to slope $\infty$ about some fixed point $r$ on
$L$. This test line $P$ meets the points of $X$ in some order, which we associate with the
intersection point $r$. Then the orders of the points of $X$ associated with any two points
in a given interval on $L$ are the same. This leads to the following data structure. With
each of the $\binom{n}{2} + 1$ intervals of $L$, we store the order of the points of $X$ as seen from that
interval. These intervals are organized in a search tree, such that for any line $h$ in
$E^2$, the interval in which $h$ intersects $L$ can be found in $O(\lg n)$ time. When given a query
half-plane $h^+$, we find the intersection interval of $h$ with $L$ and scan the list of points of $X$,
stored with that interval, until we meet the first point of $X$ that lies below $h$. This gives
an $O(n^3)$-storage, $O(\lg n)$-search-time data structure for the half-plane reporting problem.
(Actually, the storage can be reduced to $O(n^2)$.) This data structure can be generalized
to handle half-space reporting problems for $E^d$ in $O(2^d \lg n)$ search time and $O(d n^{2^{d-1}})$
storage.
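
A minimal sketch of the planar scheme just described, in its simple form that stores one full ordering per interval ($O(n^3)$ storage). It assumes points in general position and non-vertical query lines; the class and method names are hypothetical, and a binary search over a sorted list of breakpoints stands in for the search tree.

```python
from bisect import bisect_right
from itertools import combinations

class ColeYapPlane:
    """Half-plane reporting via orderings stored along a vertical base line L."""

    def __init__(self, points):
        self.points = list(points)
        # Base line L: the vertical line x = x0, to the right of every point.
        self.x0 = max(x for x, _ in self.points) + 1.0
        # Breakpoints: heights at which the line through a pair of points meets L.
        cuts = set()
        for (px, py), (qx, qy) in combinations(self.points, 2):
            if px != qx:                       # a vertical pair never meets L
                slope = (qy - py) / (qx - px)
                cuts.add(py + slope * (self.x0 - px))
        self.cuts = sorted(cuts)
        # One stored ordering per interval of L, taken at a representative height.
        self.orders = [self._order_seen_from(r) for r in self._representatives()]

    def _representatives(self):
        c = self.cuts
        if not c:
            return [0.0]
        mids = [(a + b) / 2.0 for a, b in zip(c, c[1:])]
        return [c[0] - 1.0] + mids + [c[-1] + 1.0]

    def _order_seen_from(self, r):
        # Order in which a line rotating about (x0, r) from slope -inf to +inf
        # meets the points: increasing slope of the line through (x0, r) and p.
        return sorted(self.points, key=lambda p: (r - p[1]) / (self.x0 - p[0]))

    def report_above(self, m, c):
        """Report the points strictly above the line y = m*x + c."""
        r = m * self.x0 + c                    # height at which the query meets L
        order = self.orders[bisect_right(self.cuts, r)]
        out = []
        for x, y in order:                     # scan until the first point not above
            if y <= m * x + c:
                break
            out.append((x, y))
        return out
```

The scan is correct because a point lies above the query line exactly when the line from $(x_0, r)$ through it has smaller slope than the query, so the answer is a prefix of the stored order.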

An improved data structure for half-space range queries, based on search-tree tables,
is obtained from a new scheme of searching arrangements of hyperplanes in $E^d$ presented
by Clarkson [12]. The data structure for the point location problem in an arrangement of
hyperplanes is constructed using random sampling. The resulting algorithm achieves an
$O(\lg n)$ query search time, and uses $O(n^{d+\epsilon})$ storage, for any fixed $\epsilon > 0$. The probabilistic
construction of the data structure takes $O(n^{d+\epsilon})$ expected time. We next describe the
essential ideas of the arrangement-searching algorithm.

In order to search the arrangement $A(H)$ of a collection $H$ of $n$ hyperplanes, we use the
triangulation $\Delta(A(H))$, which is a recursive refinement of $A(H)$ into simplicial regions.
Rather than finding a cell $C$ of $A(H)$ containing the point, we find a simplex $S$ which is a
refinement of $C$. We search $\Delta(A(H))$ by first searching $\Delta(A(J))$, the triangulation of the
arrangement of a small subset $J$ of $H$. After locating the simplex $S$ of $\Delta(A(J))$ containing
the query point, we find the subset $H^*$ of the hyperplanes of $H$ that intersect the simplex $S$,
and continue the search with these hyperplanes. (This is a generalization of the scheme for
point location in the plane presented by Kirkpatrick [26].) The construction of Clarkson's
search tree is probabilistic. At each level of the tree, we pick a random sample of the
current set of hyperplanes. For each simplex in the triangulation, we determine the set of
hyperplanes intersecting it, and recursively build a search tree for that simplex with these
hyperplanes. Clarkson showed that samples that generate a tree of logarithmic depth can
each be found in $O(1)$ expected time.

A deterministic construction of the search-tree-table based data structure of Clarkson
for the point location problem was recently presented by Chazelle & Friedman [8]. This
data structure achieves an $O(\lg n)$ search time, uses $O(n^d)$ storage, and can be deterministically
constructed in time polynomial in $n$. Recently, a new algorithm for the half-space
reporting problem in $E^d$ was discovered by Clarkson [14]. This algorithm employs another
data structure, based on search-tree tables, that achieves an $O(\lg n)$ search time, and uses
$O(n^{\lfloor d/2 \rfloor + \epsilon})$ expected storage, for any fixed $\epsilon > 0$.


6  Space-Partition  Trees 

In  this  section,  we  describe  the  method  of  space-partition  trees  for  range  queries  and 
demonstrate  its  use.  A  space-partition  tree  is  the  underlying  data  structure  of  many 
algorithms  that  use  linear  storage  and  achieve  sublinear  query  time.  The  sublinear  query 
time  is  achieved  by  using  divide  and  conquer  to  search  only  part  of  the  space-partition 
tree. 

Space-partition trees can be informally described as follows. Given a database $X$ of the
range space $T = (V, \mathcal{R})$, we construct a rooted tree $P$ with $|X|$ leaves and with a bounded
number of children per node. Each node $p$ of the tree $P$ represents some region $\mathit{reg}(p)$ of
the space. The root represents the whole space and the region represented by each internal
node in the tree is the disjoint union of the regions represented by its children. Thus, at
every level of the tree $P$, the disjoint union of the regions represented by all the nodes at
that level is the whole space. With every node $p$ of $P$, we associate a set $\mathit{set}(p)$ of points
of $X$. This set is simply the set of points of $X$ that fall in the region represented by $p$
(that is, $\mathit{set}(p) = X \cap \mathit{reg}(p)$). This guarantees that for an internal node $p$ with children
$p_1, p_2, \ldots, p_k$, we have $\mathit{set}(p) = \mathit{set}(p_1) \cup \mathit{set}(p_2) \cup \cdots \cup \mathit{set}(p_k)$, where the union is disjoint.
This recursive partitioning of the space is terminated when the set associated with a leaf $p$
is a singleton, that is, $\mathit{set}(p) = \{x\}$, for some $x \in X$. In other words, the bottom level
regions each contain a single element of $X$.

Space-partition trees can be used for solving range counting problems on a range space
$T = (V, \mathcal{R})$. Suppose that $P$ is a space-partition tree for a database $X$ of $T$. With any
node $p$ of $P$, we store the number of points in $\mathit{set}(p)$. To solve a range counting query for
a range $Y \in \mathcal{R}$, we search $P$ using the following divide and conquer scheme, starting at
the root.

1. If $\mathit{reg}(p) \subseteq Y$, add $|\mathit{set}(p)|$ to the count and do not continue with $p$'s children.

2. If $\mathit{reg}(p) \cap Y = \emptyset$, do not change the count and do not continue with $p$'s children.

3. If $\mathit{reg}(p)$ crosses $Y$, that is, neither $\mathit{reg}(p) \subseteq Y$ nor $\mathit{reg}(p) \cap Y = \emptyset$, then continue the
query recursively with $p$'s children.
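
The three cases above can be sketched for one concrete choice of regions and ranges: a tree whose regions are axis-parallel boxes obtained by median cuts, queried with axis-parallel rectangles. This is an illustration under those assumptions (and distinct coordinates among the points), not any of the specific trees from the literature; all names are hypothetical.

```python
import math

class Node:
    def __init__(self, region, count, children=(), point=None):
        self.region = region        # (xlo, xhi, ylo, yhi): half-open box reg(p)
        self.count = count          # |set(p)|, the only per-node statistic stored
        self.children = children
        self.point = point          # the single point of a leaf

WHOLE_PLANE = (-math.inf, math.inf, -math.inf, math.inf)

def build(points, region=WHOLE_PLANE, axis=0):
    """Recursively halve a non-empty point set by a median cut, alternating axes."""
    if len(points) == 1:
        return Node(region, 1, point=points[0])
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    cut = pts[mid][axis]            # left child: coordinate < cut; right: >= cut
    xlo, xhi, ylo, yhi = region
    if axis == 0:
        left, right = (xlo, cut, ylo, yhi), (cut, xhi, ylo, yhi)
    else:
        left, right = (xlo, xhi, ylo, cut), (xlo, xhi, cut, yhi)
    kids = (build(pts[:mid], left, 1 - axis), build(pts[mid:], right, 1 - axis))
    return Node(region, len(pts), kids)

def count_in_range(node, rng):
    """Count the points in the closed rectangle rng = (x1, x2, y1, y2)."""
    xlo, xhi, ylo, yhi = node.region
    x1, x2, y1, y2 = rng
    if x1 <= xlo and xhi <= x2 and y1 <= ylo and yhi <= y2:
        return node.count                       # case 1: reg(p) inside Y
    if xhi <= x1 or xlo > x2 or yhi <= y1 or ylo > y2:
        return 0                                # case 2: reg(p) misses Y
    if node.point is not None:                  # crossing leaf: test its point
        x, y = node.point
        return int(x1 <= x <= x2 and y1 <= y <= y2)
    return sum(count_in_range(c, rng) for c in node.children)   # case 3
```

A leaf has no children to recurse into, so the sketch tests its single point directly when the leaf region crosses the range.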

The usefulness of space-partition trees for range queries depends on a number of factors.
First, we need to guarantee that a space-partition tree for range counting queries requires
only linear storage, by storing only $O(1)$ information per node and having the number of
nodes be linear in $n$. In particular, we do not need to store $\mathit{set}(p)$ itself at node $p$. Second,
note that the query time depends on the number of nodes visited in the search; thus, to
achieve sublinear query time for a range $Y$, we need to guarantee that only a fraction of
the children of each node $p$ are explored, that is, that the range $Y$ crosses only a fraction
of the regions represented by $p$'s children. This property, however, does not have to hold
for every single node of $P$; it is enough to guarantee it for all nodes of depth greater than
or equal to some constant $r$. Third, for a node $p$ and a range $Y$, we need to be able to
determine quickly (in $O(1)$ time) whether $\mathit{reg}(p)$ is a subset of $Y$, whether $\mathit{reg}(p)$ and $Y$
are disjoint, or whether $Y$ crosses $\mathit{reg}(p)$.

The issues discussed above restrict the types of ranges for which range queries can
be solved efficiently using space-partition trees. We described above how to solve range
counting queries using a constant amount of information per node. Range reporting queries
can be solved similarly by recursively visiting all the descendants of a node $p$ for which
$\mathit{reg}(p)$ is fully contained in the query range $Y$. The query time, however, for range reporting
queries has the additional term of $k$, the number of points reported. The other variants of
range queries mentioned in Section 3 can be solved similarly. We also observe that space-partition
trees can only be used for range spaces of finite V-C dimension. In range spaces
with infinite V-C dimension, there are arbitrarily large finite sets of points (databases)
that do not have good space-partition trees. This observation is based on the fact that
in range spaces with infinite V-C dimension there is no recursive partition of the space
that guarantees that arbitrary ranges cross only a small fraction of the partition's regions.
Welzl [31] recently proved that the converse also holds, specifically, that range spaces with
finite V-C dimension have good partition trees. Finally, we comment that the problem of
determining the relationship between an arbitrary range $Y$ and the region represented by
a node $p$ can be quite difficult in general.

We now present a variety of space-partition trees that were developed for answering
certain kinds of range queries. The simplest space-partition trees for range queries are
probably binary search trees. In 1 dimension, half-spaces, orthogonal ranges, circular
ranges, simplex ranges, and polyhedral ranges all reduce to intervals. Queries of this type can be
answered by exploring at most 1 of the 2 children of every node, leading to an $O(\lg n)$ query
time with linear storage, achieved by a number of balanced binary search tree schemes.

One generalization of binary search trees to $d \ge 2$ dimensions is the quad tree (see [5, 22,
28]). These trees recursively partition a region in $E^d$ into $2^d$ subregions by bisecting the
region along the $d$ respective dimensions. It is important to note that quad trees do not
necessarily give good space-partition trees for an arbitrary point set $X$ in $E^d$. Whenever
good space-partition quad trees exist, however, they are quite efficient for orthogonal
range queries, achieving $O(n^{1-1/d})$ query time with linear storage. The $O(n^{1-1/d})$ query
time follows from the observation that orthogonal ranges cross at most $2^{d-1}$ of the $2^d$
subregions represented by a quad-tree node.

The first construction of space-partition trees for half-plane and polygonal range queries
in $E^2$ was presented by Willard [32]. Willard described a $J$-way polygon tree, which is a
recursive partition of the plane by $J$ lines into $2J$ polygonal regions, each containing
approximately $n/(2J)$ of the $n$ database points. Willard's polygon tree has the property
that any line in the plane intersects (crosses) at most $J + 1$ of the $2J$ subregions represented
by any node in the space-partition tree. In addition, for any polygon in the plane, there
is some depth $r$ of the tree, such that the polygon intersects at most $J + 1$ of the $2J$
subregions represented by any node at depth greater than or equal to $r$. These properties
lead to an $O(n^{\log_{2J}(J+1)})$ query time with linear storage for half-plane and polygonal range
queries. This query time is minimized for $J = 3$, for which the 3-way polygon tree achieves
$O(n^{\log_6 4}) = O(n^{0.774})$ query time. The preprocessing time for constructing a 3-way polygon
tree is $O(n^2)$.
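
The exponent can be checked with a short calculation. If a query crosses at most $J + 1$ of the $2J$ subregions of a node, each holding about $n/(2J)$ points, the number of visited nodes satisfies $T(n) \le (J+1)\,T(n/(2J)) + O(1)$, which solves to $O(n^{\log_{2J}(J+1)})$. The sketch below evaluates this exponent; the function names are hypothetical.

```python
import math

def query_exponent(children, crossed):
    """Exponent a for which T(n) = crossed * T(n / children) + O(1) is O(n^a)."""
    return math.log(crossed) / math.log(children)

def willard_exponent(J):
    """Exponent for a J-way polygon tree: J + 1 of the 2J subregions are crossed."""
    return query_exponent(2 * J, J + 1)

# Scan a range of J values for the smallest exponent.
best_J = min(range(2, 20), key=willard_exponent)
```

Over this range the minimum is at $J = 3$, where the exponent is $\log_6 4 \approx 0.774$; $J = 4$ gives a very slightly larger value, $\log_8 5$.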

Extending Willard's ideas for the 3-dimensional case, Yao [35] showed that a space-partition
tree can be constructed for any point set in $E^3$. Yao demonstrated an octant-tree
for half-space and polyhedral range queries that uses linear storage and has an $O(n^{0.98})$
query time. The octant-tree is a recursive partition of a region in $E^3$ by 3 planes into
8 subregions of dimension 3 and a few other subregions of lower dimension. Each of
the 8 subregions of dimension 3 is guaranteed to contain at least 1/24 of the points of
the parent region. Any plane in $E^3$ intersects at most 7 of the 8 subregions represented
by a node in the octant-tree. When the lower dimensional subregions are also taken
into account, the query time is shown to be $O(n^{0.98})$. Following Yao's result, Dobkin,
Edelsbrunner, and Yao [18] showed that using the same ideas, the query time can be
improved to $O(n^{\log_8 7}) = O(n^{0.936})$. The preprocessing time of Yao's approach was polynomial
in $n$, and was later improved by Cole, Sharir, and Yap [15].

The first general scheme of space-partition trees for any dimensional Euclidean space
$E^d$ was presented by Yao & Yao in [36]. They treated a variety of range queries and some
other optimization queries under one class of geometric generic queries, and demonstrated a
general scheme of solving generic queries in linear storage and sublinear query time. They
showed that any finite point set $X$ in $E^d$ has a regular space-partition tree $P$ of degree $2^d$
with the following property. For any internal node $p$ of $P$ and any child $q$ of $p$ we have
$|\mathit{set}(q)| \le (1/2^d) |\mathit{set}(p)|$, and any hyperplane $h$ in $E^d$ intersects at most $2^d - 1$ subregions
of children of $p$. This leads to a linear storage, sublinear query-time scheme for half-space
range queries in any dimensional Euclidean space. The query time of Yao & Yao's scheme
is $O(n^{\lg(2^d - 1)/d})$ for half-space and polyhedral range queries in $E^d$.

Unfortunately, Yao & Yao's scheme is not constructive. But, although Yao & Yao do
not provide a practical scheme for constructing the balanced space-partition tree in $E^d$,
their result is of primary importance, as it demonstrates that such balanced space-partition
trees exist for any finite point set in $E^d$.

The best known query-time bounds for linear-storage data structures for half-space and
simplicial range counting queries in $E^d$, for $d > 2$, were presented by Haussler & Welzl
[25]. (A better query time for $E^2$ was recently given by Welzl [31].) Their construction of
space-partition trees uses random sampling and is based on the notion of $\epsilon$-nets introduced
in Section 3. Their scheme achieves $O(n^\alpha)$ query time, for $\alpha > d(d-1)/(d(d-1)+1)$. The
construction of the space-partition tree is probabilistic and takes $O(n \lg n)$ expected time.

The partition trees of Haussler & Welzl, called $(\epsilon, v)$-partition trees, are a variant of
the space-partition trees described above. The main difference is that in $(\epsilon, v)$-partition
trees, the region represented by an internal node $p$ is not divided into subregions containing
approximately the same number of points. Rather, the subdivision of the region $\mathit{reg}(p)$
in $(\epsilon, v)$-partition trees is determined as follows. A subset $\mathit{points}(p)$ of $v > d$ points from
$\mathit{set}(p)$ is selected, and the $O(v^d)$ hyperplanes, each passing through exactly $d$ of these $v$
points, are computed. The arrangement of these $O(v^d)$ hyperplanes is determined and
stored at node $p$ as $\mathit{arr}(p)$. For each cell $f$ in $\mathit{arr}(p)$ with $f \cap \mathit{set}(p) \ne \emptyset$, there is a child $p_f$
with $\mathit{reg}(p_f) = f \cap \mathit{reg}(p)$. When the number of points in $\mathit{set}(p)$ is at most $v$, the recursive
refinement terminates, that is, $p$ is a leaf node.
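
In the plane ($d = 2$) the subdivision step at one node can be sketched as follows: the hyperplanes are the lines through pairs of the selected points, and two points fall in the same face of the arrangement exactly when they lie on the same side of every line. The sketch groups points by that sign vector. How the $v$ points are selected is not shown, and all names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def line_through(p, q):
    """Coefficients (a, b, c) of the line a*x + b*y = c through p and q."""
    (px, py), (qx, qy) = p, q
    a, b = qy - py, px - qx
    return a, b, a * px + b * py

def side(line, p):
    """Return +1 or -1 for the two sides of the line, 0 if p lies on it."""
    a, b, c = line
    s = a * p[0] + b * p[1] - c
    return (s > 0) - (s < 0)

def split_by_arrangement(points, chosen):
    """Group points by the face of the arrangement of lines through pairs of chosen."""
    lines = [line_through(p, q) for p, q in combinations(chosen, 2)]
    faces = defaultdict(list)
    for p in points:
        faces[tuple(side(l, p) for l in lines)].append(p)
    return list(faces.values())
```

Each returned group corresponds to one non-empty face, that is, to one child of the node in the construction above.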

In an $(\epsilon, v)$-partition tree, the subdivision of the region represented by an internal node
$p$ depends on the selection of the $v$ points from $\mathit{set}(p)$. A desired subdivision has the
following property: for any hyperplane $h$ in $E^d$, the total number of points of $\mathit{set}(p)$ in all
cells of $\mathit{arr}(p)$ intersected by $h$ is at most $\epsilon |\mathit{set}(p)|$. This condition is immediately satisfied
for $\epsilon = 1$, but we are interested in smaller values of $\epsilon$. Intuitively, the smaller the value of
$\epsilon$ is, the smaller the number of children of an internal node $p$ that need be explored. The
existence of $(\epsilon, v)$-partition trees for any value of $0 < \epsilon < 1$ is based on the existence of
$\epsilon$-nets for some specific ranges (namely, $(d+1)$-corridors) of $X$.

The bounds of [25] are not optimal. Recently, it was shown by Welzl [31] that half-plane
and triangular range queries in $E^2$ can be solved with linear storage and $O(\sqrt{n} \lg^3 n)$
query time. Welzl's algorithm also employs a space-partition tree based on $\epsilon$-nets. These
space-time bounds for triangular ranges are optimal up to a polylog factor (see Section 7
for corresponding lower bounds).


7  Lower  Bounds

In this section we present some lower bounds on the query time and the storage requirements
of various types of range queries.

Dynamic Data Structures for Orthogonal Range Queries. Investigating the complexity
of dynamic data structures that support range queries as well as insertion and
deletion, Fredman [23] presented some lower bounds for orthogonal range queries. He
showed that the time complexity of performing a sequence of $n$ intermixed manipulations
with $d$-dimensional keys is $\Omega(n \lg^d n)$. This lower bound is tight for dynamic data structures
for orthogonal range queries as was shown by Fredman & Bentley [3] and by Lueker [29].

Dynamic Data Structures for Half-plane, Circular, and Parabolic Ranges in $E^2$.
Generalizing the above lower bounds to half-plane ranges, circular ranges, and parabolic
ranges, Fredman showed in [24] an $\Omega(n^{4/3})$ lower bound on the worst case time complexity
of processing a sequence of $n$ manipulations. These bounds show that the orthogonal range
queries are intrinsically much easier than the latter range queries.

Static Data Structures for Circular Ranges in $E^2$ and Half-Space Ranges in $E^3$.
In [34] Yao showed that any static data structure for circular ranges in $E^2$ that uses $O(n)$
storage must have an $\Omega(n^\epsilon)$ query time, for some $\epsilon > 0$. Since circular range queries in $E^2$
can be reduced to half-space range queries in $E^3$ (see Yao [35]), the same bound applies
for half-space range queries in $E^3$.

Static Data Structures for Simplex Counting Queries. In [7] Chazelle proved a
family of lower bounds involving the query time and the storage complexities of simplex
counting range queries. For dimension $d = 2$, if the storage complexity of an algorithm
is $O(m)$, then the query time complexity is shown to be $\Omega(n/\sqrt{m})$. More generally, in
dimension $d > 2$, the query time complexity is $\Omega((n/\lg n)/m^{1/d})$. These bounds hold with
high probability for a random point set, and are thus valid in the worst case as well as in
the average case.

These bounds imply, for instance, that if the storage is restricted to be linear, then the
query time for $d = 2$ is $\Omega(\sqrt{n})$. For $d > 2$ dimensions, the query time is $\Omega(n^{1-1/d}/\lg n)$.
On the other hand, for the query time to be polylogarithmic in $n$, the storage has to be
$\Omega(n^{d-\epsilon})$ in $E^d$, for $d > 2$ and any fixed $\epsilon > 0$. It is important to emphasize that these bounds
are only for the simplex counting problem.
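
As a worked check of these consequences, assume the bounds read $\Omega(n/\sqrt{m})$ for $d = 2$ and $\Omega((n/\lg n)/m^{1/d})$ for $d > 2$, where $Q(n)$ denotes the query time and $m$ the storage; the exponent $b$ of the polylogarithm below is a symbol introduced only for this calculation.

```latex
% Linear storage: substitute m = cn for a constant c.
Q(n) = \Omega\!\left(\frac{n}{\sqrt{cn}}\right) = \Omega(\sqrt{n}) \quad (d = 2),
\qquad
Q(n) = \Omega\!\left(\frac{n/\lg n}{(cn)^{1/d}}\right)
     = \Omega\!\left(\frac{n^{1-1/d}}{\lg n}\right) \quad (d > 2).

% Polylogarithmic query time: combine Q(n) = O(\lg^{b} n) with the lower bound.
\frac{n/\lg n}{m^{1/d}} = O(\lg^{b} n)
\;\Longrightarrow\;
m = \Omega\!\left(\left(\frac{n}{\lg^{b+1} n}\right)^{d}\right).
```

The last expression is within a polylogarithmic factor of $n^d$, so the storage cannot fall polynomially below $n^d$ while the query time stays polylogarithmic.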

Static Data Structures for Simplex Reporting Queries. The bounds for the simplex
reporting problem are different from those for the simplex counting problem, since the
query time has two components, the search time and the report time. For a slightly
weaker machine model, Chazelle [7] proved that if the query time is $O(\lg^b n + k)$, where
$k$ is the number of points reported and $b > 1$, then the storage requirement is $\Omega(n^{d-\epsilon})$
for any fixed $\epsilon > 0$, where $d$ is the dimension of the space.

This lower bound is quite surprising, since it implies that in $E^2$, a polylogarithmic
query time for the simplex reporting problem requires $\Omega(n^{2-\epsilon})$ storage for any fixed
$\epsilon > 0$. For the half-plane reporting problem, however, there is an algorithm by Chazelle,
Guibas, and Lee [9] that uses $O(n)$ storage and its query time is $O(\lg n + k)$. This
demonstrates that simplex range queries are intrinsically harder than half-space range queries.


8  Conclusions


In  this  section  we  summarize  the  three  methods  for  range  queries  and  conclude  with  some 
possible  research  directions. 

Summary of Methods. The three methods, random sampling, search-tree tables, and
space-partition trees, have different tradeoffs between query time, storage requirement,
preprocessing, and their applicability to different types of range queries. When only
approximate solutions are required, random sampling seems to be the method of choice.
Furthermore, random sampling applies to a large variety of ranges, with relatively little
preprocessing. Both search-tree tables and space-partition trees can be used for exact
counting and reporting queries. Search-tree tables, when applicable, are a better choice
for reporting queries, since their search time is polylogarithmic. Space-partition trees,
however, achieve linear storage and sublinear query time, which may be the desired tradeoff
for a variety of range counting queries. The following table summarizes the important
characteristics of the three methods. The constants in the table satisfy $a, b, c, p, q \ge 1$ and
$0 < \alpha < 1$.


                 Random Sampling        Search-Tree Tables    Space-Partition Trees

Query time       $O(1)$                 $O(\lg^a n)$          $O(n^\alpha)$
Storage          $O(1)$                 $O(n^b)$              $O(n)$
Preprocessing    $O(1)$                 $O(n^p)$              $O(n^q)$
Exact answer     no                     yes                   yes
Range spaces     finite V-C dimension   $O(n^c)$ solutions    finite V-C dimension
Query type       counting               counting/reporting    counting/reporting

The three methods are not completely unrelated. Both search-tree tables and space-partition
trees can be considered as possible generalizations of 1-dimensional binary trees.
A 1-dimensional binary tree can be thought of as a search tree, with solutions stored at its
leaves, which is the underlying scheme of search-tree tables. Alternatively, a 1-dimensional
binary tree can be thought of as a recursive partition of the 1-dimensional space into two
half spaces, which is the underlying scheme of space-partition trees. In addition, it is
sometimes possible to combine two methods, as was demonstrated for random sampling
and search-tree tables, and for random sampling and space-partition trees. In these cases,
random sampling was used in the construction of the data structure but not in processing
the queries.

Tight Upper and Lower Bounds. For almost all the types of range queries mentioned in
this paper, the known upper and lower bounds do not match. The bounds are tight only for
orthogonal range queries. Tighter bounds are certainly of both theoretical and practical
significance.

Dynamic  Data  Structures  for  Range  Queries.  With  the  exception  of  orthogonal 
range  queries,  no  optimal  dynamic  data  structures  are  known  for  general  range  queries 
in  d  dimensions.  Since  dynamic  data  structures  are  of  primary  significance,  the  design  of 
efficient  dynamic  data  structures  for  range  queries  is  an  important  direction  of  research. 

Reductions of Range Queries. As was mentioned in the introduction, some types of
range queries can be reduced to others. These reductions usually involve transformations
of objects from a $d$-dimensional Euclidean space to a Euclidean space of higher dimension.
Other such reductions and relations between different types of ranges are of interest.